(57) Abstract

The performance of a disk cache subsystem is enhanced by dynamically sizing read
requests based upon the current cache hit rate. Accordingly, the size of the read request
depends upon at least one variable factor other than the size of the requested data. More
specifically, the size of the read request is reduced as the cache hit rate declines, and the
size of the read request is increased as the cache hit rate increases. Short-term and
long-term cache hit rates are tracked. The short-term cache hit rate is used to determine
the reduction in the size of the read request, and the long-term cache hit rate is used to
determine the increase in the size of the read request. The read request is a read-around
request formulated to obtain the immediately requested data, plus additional data which
is not requested and which is located before and after the immediately requested data.
When used in a Windows operating system environment, the initial read-around value is
about 1 megabyte.



[Front-page drawing (Fig. 2), system 20: application program 12; disk cache enhancer 22 comprising read time stamp 24, performance monitor 26 and second read requester 28; cache manager 14; disk. Flows labeled: first disk read, first data, second disk read, second data (not retained).]






WO 99/34356                                                    PCT/US98/27417



TITLE OF THE INVENTION 
DISK CACHE ENHANCER WITH DYNAMICALLY SIZED 
READ REQUEST BASED UPON CURRENT CACHE HIT RATE 



BACKGROUND OF THE INVENTION 
Cache is a storage area that keeps frequently accessed data or program
instructions readily available so that data or program instructions (both hereafter referred
to as "data") used by a computer do not have to be repeatedly retrieved from a
secondary storage area. In one typical scheme, cache is a form of random access memory 
(RAM) which can be directly and quickly accessed by the computer's processor. In 
contrast to cache, a computer also includes one or more secondary storage areas, typically 
disk devices, which can only be accessed through an input/output (I/O) device, thereby 
providing much slower access time than the cache. Ideally, a computer would run 
entirely from data stored in RAM-type cache. However, RAM is very expensive relative 
to disk memory. Thus, a small amount of cache is provided in the computer (relative to 
memory capacity of the disks) to improve overall performance of the computer. 

Fig. 1 shows a prior art computer system 10 which uses cache. The cache
in Fig. 1 is "software cache" or "disk cache" which is designed to store data and program 
instructions which are frequently accessed from a disk drive or tape drive. The system 10 
is shown with an application software program 12 running and issuing I/O requests to a 
cache manager 14. The cache manager 14 retains a memory of data recently accessed 
from one or more physical disks, collectively referred to as disk 16, and which is stored in 
cache 18 within the cache manager 14. If the program 12 requests data which exists in 
the cache 18, the data is retrieved directly from the cache 18 and no read request for the
data is sent to the disk 16. However, if the requested data is not in the cache 18, the 
cache manager 14 issues an I/O read request to the disk 16, resulting in a seek and 
transfer of data from the disk 16 into the cache 18. Then, the cache manager 14 copies 
the data (now stored in the cache 18 for potential subsequent use) into the memory of the 
application program 12 (not shown) for immediate use. 

In one conventional system 10, the cache 18 is divided into "pages" and 
the data in the cache 18 includes pages of one or more application program instructions
and/or pages of data used by the one or more programs. When the application program
12 requests a page, and the page is not in the cache 18, a "page fault" occurs. Upon the 
occurrence of a "page fault," the cache manager 14 transmits a disk read request to the 
disk 16 to retrieve the page. The retrieved page is forwarded to the application program 
12 and is cached for potential subsequent use. 

The "cache hit rate" is a measure of the percentage of times that the 
requested data is available in the cache 18, and, thus, does not need to be retrieved from 
the disk 16. Disk drive life and program execution speed will improve as the cache hit
rate increases, since read requests cause physical wear and since data access time from 
cache is typically significantly faster than data access time from a disk. Many schemes 
have been developed to optimize the disk cache process so as to minimize the number of 
seek and read requests for data stored on the disk 16. Some schemes affect how the cache 
18 is "populated" or "primed" with data. Other schemes are used to decide which data 
should be purged from the cache 18 as the space in the cache 18 becomes filled. Still 
other schemes are used to decide how to share valuable computer RAM between virtual 
memory and disk cache. U.S. Patent No. 5,581,736 (Smith), which is incorporated by
reference in its entirety herein, is one example of the latter scheme.
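The read path of the conventional cache manager 14 in Fig. 1, together with the hit-rate definition above, can be sketched as a small C model. This is a minimal illustration under stated assumptions, not the Windows implementation: the page size, the four-slot cache, the in-memory array standing in for disk 16, the first-in, first-out replacement policy, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy parameters; real page and cache sizes differ. */
#define PAGE_SIZE   16
#define DISK_PAGES  64
#define CACHE_SLOTS 4

static char disk[DISK_PAGES][PAGE_SIZE];      /* stands in for disk 16 */

struct cache_slot {
    bool valid;
    int  page;
    char data[PAGE_SIZE];
};
static struct cache_slot cache[CACHE_SLOTS];  /* stands in for cache 18 */
static int next_victim = 0;                   /* first-in, first-out replacement */
static long hits = 0, misses = 0, disk_reads = 0;

/* Hypothetical disk I/O: each call models one seek plus transfer. */
static void disk_read_page(int page, char *dst) {
    disk_reads++;
    memcpy(dst, disk[page], PAGE_SIZE);
}

/* Serve one page request, touching the disk only on a cache miss. */
void cache_manager_read(int page, char *app_buf) {
    assert(page >= 0 && page < DISK_PAGES);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].page == page) {  /* hit: no disk request */
            hits++;
            memcpy(app_buf, cache[i].data, PAGE_SIZE);
            return;
        }
    }
    misses++;                                           /* miss ("page fault") */
    struct cache_slot *s = &cache[next_victim];
    next_victim = (next_victim + 1) % CACHE_SLOTS;
    disk_read_page(page, s->data);                      /* disk -> cache */
    s->valid = true;
    s->page = page;
    memcpy(app_buf, s->data, PAGE_SIZE);                /* cache -> application */
}

/* Cache hit rate: fraction of requests served from the cache. */
double cache_hit_rate(void) {
    long total = hits + misses;
    return total ? (double)hits / (double)total : 0.0;
}
```

Requesting page 3, page 3 again, page 5, and then page 3 yields two hits and two misses, so the hit rate is 0.5 and only two disk reads are issued.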

One conventional scheme to improve the cache hit rate is to pre-read
additional, unrequested data whenever a disk read request occurs. More specifically, this
scheme reads the requested data from the disk, as well as a small amount of additional
data on the disk which follows the requested data. This scheme is based on the fact that
data which is physically stored on the disk after the requested data is oftentimes likely to
be needed shortly after the requested data is needed. The amount of additional,
unrequested data that is read from the disk is called the "read-ahead size." One
conventional disk caching subsystem provided in Microsoft Windows has a small, fixed
read-ahead size which can be preset by the user up to a maximum value of 64 kilobytes
(64K). For example, if the read-ahead size is 64K and 10K of data must be retrieved
from the disk because it is immediately needed and is not present in the cache, then 74K
of data is retrieved and cached. The 74K of data consists of the requested 10K, plus the
subsequent 64K of data on the disk. Likewise, if the read-ahead size is 64K and 100K of
data must be retrieved from the disk because the 100K of data is immediately needed and
is not present in the cache, then 164K of data is retrieved and cached. Some
disadvantages of this scheme are as follows: 

(1) The maximum read-ahead size is very small, thereby limiting the amount of 
additional data that is pre-read into the cache for potential subsequent use. 

(2) The read-ahead size is fixed and thus cannot dynamically change based upon system 
performance. 

(3) The additional read data (i.e., read-ahead data) is always data which follows the 
requested data. In some instances, a program is likely to need data which precedes the 
requested data. In the conventional scheme, a separate disk read must be performed to 
obtain the preceding data unless the preceding data was coincidentally captured as part of 
the read-ahead data associated with a different prior disk read operation. 
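The fixed read-ahead arithmetic in the 10K and 100K examples, and disadvantage (3), can be made concrete with a short C sketch. The function name and the byte-range struct are hypothetical; the only values taken from the text are the 64K ceiling and the example sizes.

```c
#include <assert.h>

#define KB 1024L
#define MAX_READ_AHEAD (64 * KB)   /* user-settable ceiling described above */

struct byte_range {
    long start;    /* first byte fetched from disk */
    long length;   /* number of bytes fetched */
};

/* Range fetched on a miss under a fixed read-ahead scheme: the requested
 * bytes plus read_ahead bytes that FOLLOW them. The range never starts
 * before req_start, so preceding data is never pre-read (disadvantage 3). */
struct byte_range fixed_read_ahead(long req_start, long req_len, long read_ahead) {
    assert(req_start >= 0 && req_len > 0);
    assert(read_ahead >= 0 && read_ahead <= MAX_READ_AHEAD);
    struct byte_range r = { req_start, req_len + read_ahead };
    return r;
}
```

A 10K request with a 64K read-ahead yields a 74K range, and a 100K request yields 164K, matching the examples; in both cases the range begins exactly at the requested start.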

Despite the many schemes for improving and optimizing disk cache
performance, there is still a need to further improve and optimize performance, and thus
further reduce the number of disk read requests. The present invention fulfills this need.

BRIEF SUMMARY OF THE PRESENT INVENTION 
A method is provided of reading data in a computer system, wherein the 
computer system includes a storage device and a cache in communication with the 
storage device. The method comprises tracking a cache hit rate of the computer system,
detecting a request for data which is immediately requested by the computer system but
which is not currently present in the cache, formulating a read request to obtain the
requested data from the storage device, and dynamically sizing the read request based
upon the current cache hit rate. The size of the read request is related to the cache hit rate
in a manner such that the size of the read request is reduced as the cache hit rate declines,
and the size of the read request is increased as the cache hit rate increases. Short-term
and long-term cache hit rates are tracked. The short-term cache hit rate is used to
determine the reduction in the size of the read request, and the long-term cache hit rate is
used to determine the increase in the size of the read request.

BRIEF DESCRIPTION OF THE DRAWINGS

The following detailed description of preferred embodiments of the invention
will be better understood when read in conjunction with the appended drawings. For
the purpose of illustrating the invention, there is shown in the drawings embodiments
which are presently preferred. It should be understood, however, that the invention is not
limited to the precise arrangements and instrumentalities shown. In the drawings:

Fig. 1 is a schematic block diagram of a conventional disk caching scheme;

Fig. 2 is a schematic block diagram of a disk caching scheme in
accordance with a first embodiment of the present invention;

Fig. 3 is a combined functional flowchart and schematic block diagram of
the disk caching scheme of the present invention; and

Fig. 4 is a schematic block diagram of a disk caching scheme in
accordance with a second embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION 
Certain terminology is used herein for convenience only and is not to be taken as
a limitation on the present invention. In the drawings, the same reference numerals are
employed for designating the same elements throughout the several figures.

Fig. 2 is a schematic block diagram of a disk caching scheme in
accordance with a first embodiment of the present invention. Referring to Fig. 2, system
20 is similar in many respects to the conventional system 10 of Fig. 1, except that the
system 20 includes an additional element, namely, a disk cache enhancer 22 (hereafter,
DCE 22). The DCE 22 functions as an add-on device to the conventional system 10 and
does not interfere with the normal operation of the conventional system 10. Instead, the
DCE 22 generates additional commands in the form of second disk reads to the cache
manager 14 to improve the hit rate of the cache 18. One conventional system 10 suitable
for use with the system 20 is the disk cache subsystem of Windows 95, Windows 98, or
Windows NT.

The DCE 22 includes a time stamp 24 for incoming read requests; a 
performance monitor 26 for tracking a long-term cache hit rate, a short-term cache hit 
rate, disk transfer rate, average seek time, and other statistics; and a second read requester 
28 for initiating second disk read requests. 
The system 20 operates as follows:

(1) When the application program 12 needs data, a read request is transmitted from the 
program 12. 

(2) The DCE 22 receives the read request, time stamps the read request, and forwards the 
read request unchanged to the cache manager 14. 

(3) The cache manager 14 processes the read request in a conventional manner, as
described above. Thus, if the requested data currently exists in the cache 18, the data is
retrieved directly from the cache 18 and is sent to the memory of the application program
12 (not shown) for immediate use, and no read request for the data is sent to the disk 16.
However, if the requested data is not in the cache 18, the cache manager 14 issues an I/O
read request to the disk 16, resulting in a seek and transfer of data (referred to as "first
data" in Fig. 2) from the disk 16 into the cache 18. Then, the cache manager 14 copies
the data (now stored in the cache 18 for potential subsequent use) into the memory of the
application program 12 (not shown) for immediate use. The read request transmitted by
the cache manager 14 may also obtain additional data as part of the first data based upon
the preset read-ahead size, as discussed in the background section above.

(4) Shortly after execution of the operations in (3), the performance monitor 26 of the
DCE 22 uses the read request time stamp and the arrival time of the data to determine if
the current read request resulted in a cache hit or cache miss. A "hit" is detected if the
data transmitted to the program 12 arrives faster than a preset time period (i.e., if the read
is completed quickly), and a "miss" is detected if the data transmitted to the program 12
arrives slower than the preset time period (i.e., if the read is not completed quickly). The
preset time period is based upon known cache access and disk access times. The "hit" or
"miss" status, as well as the data arrival times, are used to update the long-term cache hit
rate, short-term cache hit rate, disk transfer rate, average seek time, and other statistics
kept by the performance monitor 26.

(5) If a "hit" is detected in the DCE 22, no further action is taken by the DCE 22 other 
than to update the statistics. 

(6) If a "miss" is detected by the DCE 22, the DCE 22 formulates a second logical disk
read from the second read requester 28 and issues it to the cache manager 14 to prime the
cache 18 and improve the subsequent hit rate. The second disk read is "dynamically
sized" based upon the current cache hit rate. The "current cache hit rate" is a moving
average of the cache hit rate over time. "Dynamically sized" means that the size of the
second read request depends upon at least one variable factor other than the size of the
requested data, which is inherently variable. The second disk read is preferably a
"read-around" request formulated to include the data requested in the original read request, plus
additional data which is not immediately requested and which is located before the
starting point and after the end point of the immediately requested data. The second read
request thus differs in at least two significant ways from the conventional read request.
First, the size of the additional data requested in the second read request is variable,
instead of being fixed as in the conventional scheme. Second, the data includes data located
before and after the requested data, instead of only after as in the conventional scheme.
Furthermore, depending upon the current cache hit rate, the size of the additional data
requested in the second read request will typically be significantly larger than the size of
additional data requested in a conventional scheme, and thus may alternatively be referred
to as a "Big Read" (see Fig. 3). For example, the second read request might ask for an
amount of data in the megabyte range, compared to a maximum of 64K in a conventional
Windows scheme. In an alternative, but less preferred, embodiment of the present
invention, the second read request is a dynamically sized conventional read-ahead
request, not a read-around request.

(7) The second disk read request is received by the cache manager 14 and processed in
the same manner as a conventional read request. That is, the cache manager 14 checks to
see if the data in the second disk read currently exists in the cache 18. If so, no read
request for the data is sent to the disk 16 and no further action is taken by the cache
manager 14 or the DCE 22. However, if not all of the data in the second disk read is
currently in the cache 18, the cache manager 14 issues an I/O read request to the disk 16,
resulting in a seek and transfer of data (labeled as "second data" in Fig. 2) from the disk
16 into the cache 18 for potential subsequent use. The cache manager 14 also forwards
the data to the DCE 22 (the requester) as part of its normal protocol. The DCE 22
receives the data but does not store it.
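Step (4) classifies each request purely by timing. The listing reproduced later in this description does this in handle_the_io() by comparing the elapsed time dt2 with dt0 = 1 + 2*len/ramspeed, where ramspeed is a calibrated RAM copy rate. The function below isolates that test as a sketch; its name and parameters are hypothetical, and it assumes times in milliseconds and ramspeed in bytes per millisecond, as the listing's comments suggest.

```c
#include <assert.h>
#include <stdbool.h>

/* Classify one completed read as a cache miss (true) or hit (false).
 * elapsed_ms: time from the DCE's time stamp to data arrival
 * len:        bytes actually returned
 * err:        true if the read failed (failed reads are never counted as misses)
 * ramspeed:   calibrated RAM copy rate in bytes per millisecond */
bool dce_is_miss(long elapsed_ms, long len, bool err, long ramspeed) {
    assert(len >= 0 && ramspeed > 0);
    /* Upper bound on a copy out of cache: same expression as dt0 in the listing. */
    long cache_copy_ms = 1 + 2 * len / ramspeed;
    return !err && elapsed_ms > cache_copy_ms;
}
```

With ramspeed at the listing's nominal 10*2^19 bytes per millisecond, a 64K read that completes within 1 ms counts as a hit, while one that takes 12 ms counts as a miss.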

In most disk drive implementations of the present invention, the second
disk read request usually results in a cache miss and a subsequent disk read, since it is
unlikely that the cache 18 contains all of the typically large amount of additional
requested data. The subsequent disk read typically occurs efficiently because the
mechanical arm of the disk drive is already at or close to the desired reading location due
to the previously executed disk read of data associated with the original read request.

As noted above, the dynamically sized second read request is based upon
the current cache hit rate. More specifically, the size of the second read request is related
to the cache hit rate in a manner such that the size of the read request is reduced as the
cache hit rate declines, and the size of the read request is increased as the cache hit rate
increases. Preferably, the size of the second read request is further dependent upon the
short-term cache hit rate and the long-term cache hit rate, wherein the short-term cache hit
rate is used to determine the reduction in the size of the read request, and the long-term
cache hit rate is used to determine the increase in the size of the read request. The
short-term/long-term scheme produces a hysteresis response in the second read request size and
allows for more stable and rapid adaptation. One suitable algorithm for determining the
size of the second read request, expressed in the C programming language and derived
directly from the source code Appendix below, is as follows:

int xn0; // current max readahead size from cache table
int xn;  // current readahead size

int xnMax=Nmax; // current physical readahead buffer size
// if less than xn and Nmax, multiple [pre]reads are done qq[slower?]
int xlen=1<<Nmax, xmsk=-1<<Nmax; char *xbuf=0;

int fastaverage=0,slowaverage=0; // no misses
#define MissShft 30
#define one (1<<MissShft)

void xnset(int new_xn) {static int xn0_=0;
    if (xn0_!=xnMax) {xn0_=xnMax; xfree(xbuf);
        phys_buf_len= 1<<xnMax;
        xbuf=malloc(phys_buf_len);
    }
    // [remaining lines of xnset() are not legible in the source]
}

int freePgs, unlockedPgs, cachePgs,chngs;
int bigread,bigreads;

int tab[]={14000,21, 13000,20, 12000,19, 11000,18, 9000,17, 0,0};

// first entry of each pair is cache size required for second entry readahead
// letting entry 1 to 0 always forces xn0 to second entry

void check_cache_size() { int i,p;
    freePgs = GetFreePageCount(0,&unlockedPgs);
    cachePgs=VCache_GetSize(0,0);
    p=(cachePgs*4096)/K;
    for(i=0; tab[i]; i+=2) {if(p>=tab[i]) {xn0=tab[i+1]; break;}}
    if (xn0<Nmin) xn0=Nmin;
    if (xn0<xn) xnset(xn0); // ok to be greater
}

#define MissSzMax 10

int max_miss[2][Nmax+1]; // max allowable miss rate for each readahead size
int min_miss[2][Nmax+1]; // increase readahead size if below this
int minMiss[MissSzMax],maxMiss[MissSzMax];
// above are decayed averages of misses

int sensitivity=3;

void doMiss(int m) { int i,n=0; // m=0 [hit?] or 1 [miss]
    fastaverage=fastaverage-(fastaverage>>sensitivity);
    slowaverage=slowaverage-(slowaverage>>(sensitivity+3));
    if (m) { // reduce readahead size possibly
        fastaverage += one>>sensitivity;
        slowaverage += one>>(sensitivity+3);
        if (slowaverage>max_miss[0][xn]) {
            if(xn>0) xnset(xn-1); // fight reducing
        }
    }
    else { // increase read ahead size possibly
        if (fastaverage<min_miss[0][xn]) xnset(xn+1); // quick increase
    }
}

void set_up_internal_tables() {int i;
    for ( i=0; i<=Nmax; i++) { // calculate miss rate thresholds
        max_miss[0][i]=one/4+one/(1+i);
        min_miss[0][i]=max_miss[0][i]/3*2; // 2/3 of max; dividing first avoids int overflow at i=0
    }
    check_cache_size();
    xnset(xn0);
}

int ramspeed=1; // 10*2^19 bytes per millisec

void calibrate_cache_copy_time(int length) {int n=1;
    char *buf1=0; int t1,t2;
    for (;length>=1024; length/=2, n*=2) {
        if (!buf1) buf1=(char*)malloc(length); // lots of ram
        else {
            t1 = getTime();
            {int i=0; for( ; i<n; i++) {memcpy(buf1,buf1+length/2,length/2);} }
            t2 = getTime();
            if (t2>t1) ramspeed=(length*n)/(t2-t1); // skip if the timer did not advance
        }
    }
    xfree(buf1);
}

// pfn, fn, Drive, ResType, CodePage and pir are the arguments of the IFS
// file-system API hook; ret, t0, dt0, dt2 and PrevHook are declared elsewhere (not shown)
void handle_the_io() {
    if (it_is_a_simple_read) { // condition left abstract in the listing
        int sfn=pir->ir_sfn, pos=pir->ir_pos; // save stuff
        int iolenv,len,pos2=0,len2=0,err, pid=pir->ir_pid;
        ioreq io; pioreq p=&io;

        io=*pir; // copy the io request

        t0 = getTime(); // time stamp

        iolenv=pir->ir_length; // attempted
        ret = (*PrevHook)(pfn, fn, Drive, ResType, CodePage, pir); // do the io

        dt2 = getTime()-t0; // elapsed time

        len=pir->ir_length; err=pir->ir_error;

        dt0=1+2*len/ramspeed; // time it would take to copy from cache
        if ( dt2>dt0 && !err ) { // we have a miss
            pos2 = io.ir_pos&=xmsk; // adjust start for read_around

            if (xn>=Nmin) { // read more
                io.ir_length = phys_buf_len; io.ir_data=xbuf;
                (*PrevHook)(pfn, fn, Drive, ResType, CodePage, p); // do bigread
            }

            len2=io.ir_length; // what we [would have] reread in
        }

        check_cache_size();

        doMiss(len2!=0); // update fast and slow averages, adju
    }
    else { // just pass it on
        ret = (*PrevHook)(pfn, fn, Drive, ResType, CodePage, pir);
    }
}
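To experiment with the doMiss() logic outside a VxD, the following self-contained C sketch reproduces its fixed-point moving averages and thresholds. Assumptions: NMAX = 21 (the largest exponent in tab[]), a starting exponent of 20 (2^20 bytes, matching the roughly 1 megabyte initial value described below), explicit bounds on xn (doMiss() itself has no upper bound), and the hypothetical names setup_thresholds(), record_request() and read_around_bytes(). Note that, as printed, the listing tracks miss rates rather than hit rates and uses the slow (long-term) average to decide reductions and the fast (short-term) average to decide increases; the sketch follows the listing.

```c
#include <assert.h>

#define NMAX 21                    /* assumed largest exponent: 2^21 bytes */
#define MISS_SHIFT 30
#define ONE (1 << MISS_SHIFT)      /* fixed-point 1.0 for miss-rate averages */

static int fast_avg = 0;           /* short-term miss rate, decays by 1/8 */
static int slow_avg = 0;           /* long-term miss rate, decays by 1/64 */
static int xn = 20;                /* current read-around exponent (2^20 bytes) */
static int max_miss[NMAX + 1];     /* shrink when slow_avg exceeds this */
static int min_miss[NMAX + 1];     /* grow when fast_avg falls below this */
static const int sensitivity = 3;

/* Same thresholds as set_up_internal_tables(): larger sizes tolerate fewer misses. */
void setup_thresholds(void) {
    for (int i = 0; i <= NMAX; i++) {
        max_miss[i] = ONE / 4 + ONE / (1 + i);
        min_miss[i] = max_miss[i] / 3 * 2;   /* 2/3 of max, divided first to avoid overflow */
    }
}

/* Record one request (miss = 1, hit = 0) and adjust xn as doMiss() does. */
void record_request(int miss) {
    fast_avg -= fast_avg >> sensitivity;
    slow_avg -= slow_avg >> (sensitivity + 3);
    if (miss) {
        fast_avg += ONE >> sensitivity;
        slow_avg += ONE >> (sensitivity + 3);
        if (slow_avg > max_miss[xn] && xn > 0)
            xn--;                             /* shrink only on sustained misses */
    } else if (fast_avg < min_miss[xn] && xn < NMAX) {
        xn++;                                 /* grow quickly once misses stop */
    }
}

/* Current read-around size in bytes. */
long read_around_bytes(void) { return 1L << xn; }
```

A long run of misses drives the exponent down to 0, and a later run of hits raises it back to NMAX. The two decay rates (1/8 versus 1/64) and the gap between min_miss and max_miss are what give the hysteresis described in the text.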
When an application program 12 randomly reads small records in different
areas of the disk 16, the hit rate declines and the pre-read size is reduced, eventually
becoming zero. When the pre-read size is zero, the DCE 22 provides no performance
improvement in the system 20. However, most application programs read related data
most of the time, and the present invention detects that situation and dynamically
enlarges the pre-read amount to a large value (typically 1 megabyte) in
comparison to conventional schemes, such as the Windows built-in cache routine, which
allows for a fixed, user-selected pre-read size up to 64K.

The present invention is based on the theory that if large reads are
resulting in a high cache hit rate, then the system should continue performing large reads
and should even increase the size of the reads. If the even larger reads further increase
the cache hit rate, then the system should try even larger reads, and so on. Likewise, if
large reads are not resulting in a high cache hit rate, then the system should stop
performing large reads, since the large reads consume system resources without providing
any significant benefit. Simply stated, if the action has great results, do more of it, and if
the action has poor results, do less of it.

Upon initiation of the system 20, the pre-read size is preferably set to 
about 1 megabyte. Since the DCE 22 does not affect the normal operation of the cache 
manager 14, the cache manager 14 continues to pre-read data according to the user preset 
value, even if the pre-read size output by the DCE 22 becomes reduced to zero as a result 
of a long period of a very low cache hit rate. 

The read-around scheme preferably starts the read at a byte address
which is the largest integral multiple of the read-ahead size that is less than or equal to the
original I/O starting address. For example, if the original I/O starting address requested
by the application program 12 is address 1,000,001, and the read-ahead size is currently
0.5 megabytes (500,000 bytes in this example), then the addresses which are read for
caching purposes are 1,000,000 to 1,499,999. One advantage of this scheme is that the
pre-read data always pieces together to create the desired file while minimizing
overlapping portions.
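The alignment rule above can be checked with a short C sketch. read_around() rounds the request start down to a multiple of the read-ahead size, as in the 1,000,001 example; read_around_start_pow2() shows the power-of-two variant that the listing appears to use (pos & xmsk, with xmsk = -1 << Nmax). Both function names and the window struct are hypothetical, and the sketch covers only the start-address rule, not requests that extend past the end of the window.

```c
#include <assert.h>

struct window {
    long long first;   /* first byte address read */
    long long last;    /* last byte address read (inclusive) */
};

/* Start at the largest multiple of size that is <= req_start, read size bytes. */
struct window read_around(long long req_start, long long size) {
    assert(req_start >= 0 && size > 0);
    long long first = (req_start / size) * size;   /* round down */
    struct window w = { first, first + size - 1 };
    return w;
}

/* Power-of-two form: clearing the low n bits rounds down to a multiple of 2^n. */
long long read_around_start_pow2(long long req_start, int n) {
    assert(req_start >= 0 && n >= 0 && n < 62);
    long long mask = -(1LL << n);   /* ...111000...0 in two's complement, like -1 << n */
    return req_start & mask;
}
```

read_around(1000001, 500000) returns the window 1,000,000 to 1,499,999, matching the example. With a power-of-two size of 2^19 bytes, the same request would instead start at 524,288.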

Fig. 3 is a combined functional flowchart/schematic block diagram 30 of
the disk caching scheme of Fig. 2. In view of the discussion above, Fig. 3 is
self-explanatory and thus is not described in further detail herein. However, it is noted that
the short-term cache hit rate is referred to in Fig. 3 as the FAST moving average, and the
long-term cache hit rate is referred to in Fig. 3 as the SLOW moving average.

PERFORMANCE RESULTS OF PRESENT INVENTION

An important industry performance measure is the Ziff-Davis "WinBench"
test suite, which is widely quoted when comparing the cost performance of various
manufacturers' personal computers. The present invention improves the performance of
the "Business WinDisk" section of the WinBench tests by 30 to 100 percent, depending
on the available hardware resources. The most important resource is the amount of
internal memory allocated for disk caching. More modern systems with faster clock
speeds and faster disk transfer rates tend to show greater improvement from the use
of the present invention.

Another important improvement is in reduced program loading time for
large programs that are page faulted in. This is best explained in the context of Windows
95, Windows 98 and Windows NT. Consider, for example, the start-up of a 1 megabyte
program. First, the operating system places information about the whole program in the
page table, but the program itself is not read into memory. The operating system then
attempts to execute the first instruction, causing a page fault. The missing page (which is
only 4K) is then read in from the disk cache. The program is allowed to run until another
page fault occurs, and the process continues until an initial "working set" of pages is in
virtual memory. Thereafter, the program can run with a relatively small number of page
faults.

Significant time can be expended if the disk caching subsystem performs a
physical seek and a read for each page being faulted in. Windows therefore reads in a
small fixed amount (under user control up to 64K) of additional data and places that data
into the cache in case it is required shortly by another page fault. In this situation, the
present invention detects the cache miss and reads in a larger section of the program,
including parts of the program preceding the page fault, forcing more data into the cache.
This has the effect of reducing the loading time of programs such as Netscape by half in a
typical configuration.

A further performance improvement is also caused by the present 
invention as a result of the improved hit rate. Windows also monitors the hit rate and 
adjusts the cache size, as described in U.S. Patent No. 5,581,736 (Smith). The present 
invention, when practiced with the Smith scheme, causes Windows to allocate more 
memory to the cache, thereby resulting in a further performance improvement. 

HARD DISK RELIABILITY IMPROVEMENTS 
RESULTING FROM USE OF THE PRESENT INVENTION 

The main failure mode of hard disks occurs during seeks. The present 
invention significantly reduces the number of seeks. For example, the Ziff-Davis 
benchmark previously mentioned contains snapshots of activity caused by common PC 
programs including MS-Office, Lotus, Excel, MS Word, PowerPoint and others. This 
activity contains about 52,000 reads which normally cause 12,000 seeks. The present
invention reduces the seeks to 2,140, roughly 82 percent fewer. The reduced number of
seeks translates into a substantial improvement in disk lifetime.

The DCE 22 is preferably implemented as a software driver. The DCE 22 
may be installed as a device driver with any Windows 95, Windows 98, Windows NT 
operating system, or the like. 

The present invention is particularly useful in computer applications that 
make extensive use of disk reads. However, the scope of the invention includes systems 
wherein the disk 16 is another form of a storage device, such as a tape drive. More 
generally, the storage device may be any type of memory which is associated with a 
cache for the memory. 

SECOND EMBODIMENT WITH DCE FUNCTIONS
INTEGRATED INTO DISK CACHE SUBSYSTEM/CACHE MANAGER

In the first embodiment of the present invention described above and
illustrated in Figs. 2 and 3, the DCE 22 operates independently of the cache manager 14
and thus is particularly suitable as an add-on or retrofit scheme. However, the functions
of the DCE 22 would likely be performed more efficiently if they were integrated into the
cache manager 14. Some advantages of an integrated scheme are as follows:

(1) The performance monitor statistics and the cache hit and miss detection functions of 
the conventional cache manager 14 can be directly used for determining the size of the 
variable read request, in place of the indirect scheme of Fig. 2 which is used to obtain the 
statistics and detect cache hits and misses. Thus, the time stamp 24 and performance
monitor 26 of Fig. 2 may be eliminated.

(2) The first disk read can be dynamically sized based upon the algorithms described 
above. Thus, no second disk read or return of second data would be required and the 
second read requester 28 of Fig. 2 may be eliminated. 

Fig. 4 shows an integrated system 32. The system 32 is generally similar
to the system 10 of Fig. 1, except that cache performance monitor 34 tracks short-term
and long-term cache hit rates, whereas a conventional cache performance monitor tracks
only one cache hit rate. Furthermore, the cache manager 14' includes a pre-read size
calculator 36 to determine a dynamically sized disk read, whereas a conventional cache
manager 14 outputs a fixed size disk read.
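The pre-read size calculator 36 is not specified beyond this description. One plausible sketch, assuming it combines the adaptive exponent produced by the fast/slow averages with the cache-size cap that check_cache_size() applies in the listing, is shown below. The function names, the nmin parameter (standing in for the listing's Nmin, whose value is not given), and the fallback when the cache is smaller than every table entry are all assumptions.

```c
#include <assert.h>

/* {minimum cache size, read-around exponent} pairs copied from tab[] in the
 * listing; the size is apparently in kilobytes (cachePgs * 4096 / K). */
static const int cache_cap_table[][2] = {
    {14000, 21}, {13000, 20}, {12000, 19}, {11000, 18}, {9000, 17}, {0, 0}
};

/* Largest exponent allowed for the current cache size, never below nmin.
 * Falls back to nmin when the cache is smaller than every table entry. */
int max_exponent_for_cache(int cache_kb, int nmin) {
    int cap = nmin;
    for (int i = 0; cache_cap_table[i][0] != 0; i++) {
        if (cache_kb >= cache_cap_table[i][0]) {
            cap = cache_cap_table[i][1];
            break;
        }
    }
    return cap < nmin ? nmin : cap;
}

/* Pre-read size in bytes: the adaptive exponent xn, clipped to the cache cap.
 * Returns 0 below nmin, where the listing skips the big read entirely. */
long preread_bytes(int xn, int cache_kb, int nmin) {
    assert(xn >= 0 && nmin >= 0 && nmin < 31);
    int cap = max_exponent_for_cache(cache_kb, nmin);
    if (xn > cap) xn = cap;
    if (xn < nmin) return 0;
    return 1L << xn;
}
```

With a hypothetical nmin of 12, an exponent of 20 yields a 1 megabyte pre-read when the cache is large but is clipped to 512K (2^19) when the cache is about 12,500K.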
Although the present invention is preferably used with software cache or
disk cache, the present invention may also be used in conjunction with other types of
cache, such as cache memory or memory cache. For example, hardware cache is cache
memory on a disk drive controller or a disk drive. The hardware cache stores frequently
accessed program instructions and data, as well as additional tracks of data that a program
might need next. A computer can access required data much more quickly from the
hardware cache than from the disk. The data in the hardware cache is delivered directly
to an expansion bus. A memory cache, sometimes called a cache store or RAM cache, is
a portion of memory made of high speed static RAM (SRAM) instead of the slower and
cheaper dynamic RAM (DRAM). In memory caching, data and instructions are cached in
SRAM to minimize the need to access the slower DRAM. Memory caches may be
internal (Level 1 (L1)) or external (Level 2 (L2)). In a memory cache scheme, the
"storage device" would be the DRAM. The scheme disclosed in the present invention
may be adopted for all of the above-noted caching processes.

The following Appendix is the source code for one suitable 
implementation of the first embodiment of the present invention. 

It will be recognized by those skilled in the art that changes may be made to the
above-described embodiments of the invention without departing from the broad
inventive concepts thereof. It is understood, therefore, that this invention is not limited to
the particular embodiments disclosed, but is intended to cover all modifications which are
within the spirit and scope of the invention as defined by the appended claims.
What is claimed is:
APPENDIX

/* This VxD can output information to the debug console

5/21 got 1200 with d21l2c19m16. each miss caused xn to drop from 19
cache grew from about 10 to 14 during run

5/22 got 1350 with fixed multiple read code
got 1200 with c20m17 about 2.5 times the misses [2500]
reduced cache params gave 1300, 1450, 1550, 1640 [log 800 misses]
old [shipped to compaq, asus] driver gives 900 1200 on first 2 runs

8/30 noticed we perform poorer for 'application development' tests
note that link libraries are large files that contain small binaries

9/1 2055 non cached opens found in ziffdavis
zd got 612,775,1030,1290,1480,1550,1650... cache from 10 to 19
*/

// ----

// Device preliminaries

#define DEVICE_MAIN
#include <vtoolsc.h>
Declare_Virtual_Device(IOS0)
#undef DEVICE_MAIN

//#include "apcx.h" // definitions common to app

// APCX.H - include file for Asynchronous Procedure Call example
// These definitions are used both by the calling app and the VxD

#define APCX_REGISTER CTL_CODE(FILE_DEVICE_UNKNOWN, 1, \
    METHOD_NEITHER, FILE_ANY_ACCESS)

#define APCX_RELEASEMEM CTL_CODE(FILE_DEVICE_UNKNOWN, 2, \
    METHOD_NEITHER, FILE_ANY_ACCESS)


// ----

// Static data

PVOID OpenFileApc = 0;       // ring 3 address to call
THREADHANDLE TheThread = 0;  // thread in which ring 3 call runs
ppIFSFileHookFunc PrevHook;  // previous IFS hook
int found_BIOS;              // if 1, we found the appropriate bios signature
int debugging=0;             // if >0 emits out()'s to debugger and optional log file
int preread=1;               // 0 means no preread

// ----

// Declare prototypes for control message handlers
DefineControlHandler(SYS_DYNAMIC_DEVICE_INIT, OnSysDynamicDeviceInit);
DefineControlHandler(SYS_DYNAMIC_DEVICE_EXIT, OnSysDynamicDeviceExit);
DefineControlHandler(W32_DEVICEIOCONTROL, OnW32Deviceiocontrol);
DefineControlHandler(DEVICE_INIT, OnDeviceInit);

// ----

// Routine to dispatch control messages to handlers
//

BOOL ControlDispatcher(
    DWORD dwControlMessage,
    DWORD EBX, DWORD EDX, DWORD ESI, DWORD EDI, DWORD ECX)
{
    START_CONTROL_DISPATCH
        ON_SYS_DYNAMIC_DEVICE_INIT(OnSysDynamicDeviceInit);
        ON_SYS_DYNAMIC_DEVICE_EXIT(OnSysDynamicDeviceExit);
        ON_W32_DEVICEIOCONTROL(OnW32Deviceiocontrol);
        ON_DEVICE_INIT(OnDeviceInit);
    END_CONTROL_DISPATCH
    return TRUE;
}

///#define XDEBUG 1

#ifdef XDEBUG // for dynamic testing with the apc

#define xtrap() trap()
#define breakPoint() _asm int 3
#define APC(s) {if (TheThread) {_VWIN32_QueueUserApc(OpenFileApc, (DWORD)cpyStr(s), \
    TheThread);}}

// qq this must be a dynamic dll to call the APC
#else

#define xtrap() {}
#define breakPoint() {if (debugging) _asm int 3}
#define APC(s) {} /* don't call if we're static */
#endif

void checkfilename(pioreq pir);

#define enterSync() Begin_Critical_Section(0)
#define exitSync() End_Critical_Section()

void xfree(void *x) {if (x!=0) {free(x);}}
void trap() {}

#define getTime() Get_System_Time()
int xTime() {int date,time;
    time=VTD_Get_Date_And_Time(&date);
    return date*(24*60*60)+time/1000; // seconds since
}

char s[150];
int s_length=0;

int flushalways=0,flushnow=0,flushing=0,logging=0; // to file

#define logbufsize 49000
char *logbuf;
int log_buf_length=0;
int time,time0;

void out(char *str) {char s1[160];
    if (debugging) {int time=getTime();
        int sec=(time-time0)/1000;
        // int tenths=(time-time0-sec*1000)/100;
        int hundredths=(time-time0-sec*1000)/10;
        int milliseconds=time-time0-sec*1000;
        int s1_length=sprintf(s1,"%d.%03d %s\n",sec,milliseconds,str); // prepend time
        APC(str); // called iff XDEBUG

        dprintf("%s",s1); // pass to debugger [some are lost?qq]
        if (logging) {int pushlog,pushflush,pushbug;
            // enterSync(); //qq crashes on Alt tab. can these be nested?

            if (s1_length+log_buf_length>logbufsize-4 || flushing || flushnow) {
                static int n=0;
                char file[16];

                sprintf(file,"c:\\gl#%d.log",n++);
                pushbug=debugging; debugging=0;
                pushflush=flushing; flushing=0;
                pushlog=logging; logging=0;
                if (log_buf_length)
                    xWrite(file,logbuf,log_buf_length);
                debugging=pushbug;
                flushing=pushflush; logging=pushlog;
                log_buf_length=0;
                flushnow=flushalways;
                if (flushing) {xfree(logbuf); logbuf=0;
                    logging=flushing=0;
                    return;}
            }

            log_buf_length += sprintf(logbuf+log_buf_length,"%s",s1);
            // exitSync();
        }
    }
}
char *cpyStr(char *s0) {char *s /* [illegible in source] */; strcpy(s,s0); return s;}

#define Freset -1

#define Fopen 0
#define Fread 1
#define Fwrite 2
#define Fclose 3

void fNam(int f, int n, char **p /* [rest of declaration illegible in source] */
    static int tabSz=0; static char /* [illegible in source] */

    /* [illegible in source] */

    switch (f) {
    case Fopen: /* [illegible in source] */
    /* case Fr... [illegible in source] */
        if (fTab!=0) {int i;
            for(i=0;i<tabSz;i++) {
                /* [illegible in source] */
            }
            xfree(fTab); xfree(wTab);
        } break;
    }

    // [line partly illegible in source] exitSync(); //// can't put return on this, i.e. MS bug!!

int t0=-1; // last count emitted
#define IO 1

#ifdef IO

int xWrite(char *f, void *d, int n) {
    HANDLE h; int m=0; WORD err; BYTE act;
    h=R0_OpenCreateFile(FALSE, f,
        /*mode*/ /* [remaining arguments illegible in source] */
        );

    m=R0_WriteFile(FALSE, h, d, n, 0, &err); // 0 = offset [line partly illegible in source]
    R0_CloseFile(h, &err);

    return m;
}

/*

int hist[100];

void histInit() { int i; for(i=0;i<100;i++) {hist[ [rest illegible in source]
void histInc(int i_) {
    int i=(i_>99)? 99: (i_<0)? 0: i_;
    hist[i]++;
}

void histOut() { int i,j; static char a[32]; Str h=newStr("" [rest illegible in source]
#else

void sout(char *s) { dprintf("%s",s); }
void xsout() {}

void histInit() {}
void histInc(int i_) {}
void histOut() {}
#endif

char* getPath(int d, pioreq pir) {
    _QWORD x;

    // divide by 2 from "short" to "char"; 3: "C:" ... "\0"

    int sz=(pir->ir_ppath->pp_totalLength>>1)+3;
    char *p=(char *)malloc(sz);
    if (p!=NULL) {
        p[0]='A'-1+d; p[1]=':';
        UniToBCSPath(p+2,pir->ir_ppath->pp_elements,sz-3,BCS_OEM,&x);
    }
    return p;
}

#define Nmax 19 /* max allowable single readahead size [19=512k] */

int Nmin=14; // min readahead [14=16384]

int xn0; // current max readahead size from cache table
int xn;  // current readahead size

int xnMax=19; // current physical readahead buffer size

// if less than xn and Nmax, multiple [pre]reads are done qq[slower?]
#define MissShft 30
#define one (1<<MissShft)
void seeMiss();

void xnset(int new_xn) {static int xn0=0;
    phys_buf_len=1<<xnMax;
    // ...alloc above happens at... [rest of comment illegible in source]
    if (new_xn>xn0 || new_xn<1) return;
    /* [rest of xnset illegible in source] */
}

int freePgs,unlockedPgs,cachePgs;
void ahead() { int i,p;
    pages(); p=(cachePgs*4096)/1000;
    for (/* [loop header illegible in source] */) {if (p>tab[i]) {xn0=tab[i+1]; break;}}
    if (xn0<xn) xnset(xn0); // ok to be greater
}

#define MissSzMax 10

// half = 50% missRatio; one/4 = 25%
// above [illegible] averages of misses without [illegible]
int sensitivity=3;
/* [start of this routine illegible in source] */
    if (m) { // reduce readahead?
        fastaverage += one>>sensitivity;
        slowaverage += one>>(sensitivity+3);
        if (slowaverage>max_miss[xn]) {
            if (xn==Nmin) {} // qq turn off?
            else xnset(xn-1); // fight reducing [line partly illegible in source]
        }
    }
    else { // increase readahead?
        if (fastaverage<min_miss[xn]) /* [rest illegible in source] */
    }

/*** original method
    for(i=0;i<missSz;i++) {
        int n=aMiss[i];
        aMiss[i]=n=n-(n>>i)+(m<<(MissShft-i));
        if (n<minMiss[i]) {minMiss[i]=n;}
        if (n>maxMiss[i]) {maxMiss[i]=n;}
    }
    for (i=missSz-1;i>=0;i--) { if (aMiss[ [rest illegible in source]
    xnset([illegible in source]); // change preread [qq 2?]

void seeMiss() {int i;
    // for(i=0;i<missSz;i++)
    // {sprintf(s+10*i,"%4d,%4d,",(minMiss[i]>>20)*1000/1024,(maxMiss[i]>>20)*1000/1024);}

    sprintf(s,"%d,%d %02d[%d]>%02d :%02d>%02d",cnt,miss,max_miss[xn]/S,xn,
        slowaverage/S,fastaverage/S,min_miss[xn]/S);
    out(s);
}

void setMiss() {int i; for(i=0;i<missSz;i++) {minMiss[i]=1<<MissShft; /* [rest illegible in source] */}}

void resetMiss() {int i; for(i=0;i<missSz;i++) {aMiss[i]=0;}; setMiss();}

int memsz=0; void *membuf;
void memtst() {static int sz=0;
    if (sz!=memsz) { sz=memsz; xfree(membuf);
        membuf=malloc(memsz);
        if (membuf==0) {sz=memsz=0;}
    }}

int eof=0;

int cache_always=0;

void clear() {int i; // can't call at init time

    time=time0=getTime(); t0=0;
    for (i=0; i<=Nmax; i++) { // calc miss rate thresholds
        // these should depend on disk seek time versus transfer time
        // we could test for each drive and have a table for each drive qq
        max_miss[i]=one/4+one/(1+i);
        min_miss[i]=max_miss[i]*2/3;
    }

    ahead(); // check cache size
    xnset(xn0); resetMiss();
}

#define K 1000
#define M (K*K)

int dt=250; // delta count to emit data
int test_init=1;

int offset=0;

void show_data() {

    if (test_init && debugging) {test_init=0;
        if (found_BIOS) out("Authorized"); else out("Unauthorized");
        // sprintf(s,"offset %x",offset); out(s);
        out("eof xn0 xn ctime,mtime,ptime cnt,miss cread,mread,pread cache,free,unlocked");
    }

    // if (reset) clear();
    if (t0+dt<=cnt) {t0=cnt;
        s_length=sprintf(s,"%d %d,%d %d,%d,%d %d,%d %d,%d,%d %d,%d,%d"
            ,eof
            ,xn0,xn, ctime/K,mtime/K,ptime/K
            ,cnt,miss, cread/M,mread/M,pread/M
            ,(cachePgs*4096)/K/K,(freePgs*4096)/K/K,(unlockedPgs*4096)/K/K
            );

        ++line;
        out(s);
        //seeMiss();
        setMiss(); // resets min and max only
    }
}

#define NDRV 32

int drv[NDRV]; // 0,1,2 assumed 0

void driveTest() {
    WORD szSect=0,sectsClust=0,freeClust=0,totClust=0,err=0;
    int i;

    for(i=0;i<NDRV;i++) {drv[i]=0;}
    for(i=3;i<NDRV;i++) {
        R0_GetDiskFreeSpace((BYTE)i,&sectsClust,&freeClust,&szSect,&totClust,&err);
        if (err==0) {
            char *az="_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef",*yn="NY";
            int csz=szSect*sectsClust;
            if (freeClust>0 && szSect==512) { drv[i]=1; }
            sprintf(s,"drive=%c %c, total=%04dM, free=%04dM",
                az[i],yn[drv[i]],(totClust*csz)/1000000,(freeClust*csz)/1000000); out(s);
        }
    }
}

int ramspeed=1; // 10*2^19 bytes per millisec

void calibrate() {char s[20]; int n=10, xlen=1<<Nmax;
    char *buf1=(char*)malloc(xlen),*buf2; int t1,t2;

    if (!buf1) return;
    buf2=malloc(xlen);
    if (!buf2) {xfree(buf1); return;}

    t1=getTime();
    {int i=0; for( ; i<n; i++) {memcpy(buf1,buf2,xlen);} }
    t2=getTime(); free(buf1); free(buf2);
    if (t2==t1) ramspeed=99999; //qq log this event
    else ramspeed=(xlen*n)/(t2-t1);

    if (debugging) {sprintf(s,"RAM speed= %d",ramspeed); out(s);}

    if (ramspeed==0) ramspeed=1; //qq log this

    driveTest();
}

int test_bios() { //char s[8]="AMIbios";
    char s[8]="ASUSTeK";
    char *t=(char *)0xf0000;
    int i;

    for (i=0;i<0xfff9;i++) {if (*(t+i)==s[0] && *(t+i+1)==s[1]
        && *(t+i+2)==s[2]
        && *(t+i+3)==s[3]
        && *(t+i+5)==s[5]
        && *(t+i+6)==s[6]
        && *(t+i+7)==s[7]
        && *(t+i+4)==s[4]) {offset=i; return 1;}}

    return 0;}



/* ----

File System API Hook

This function is installed to hook all file system calls. Each time a
file is opened, it allocates memory to store the name of the file
being opened, forms the file name string, and queues an APC, passing
the address of the file name string.

When the APC runs, it passes the address of the file name back to the VxD
through DeviceIOControl in order to allow the VxD to deallocate the
memory used to store the file name.

** Values for ir_flags for VFN_OPEN: from ifs.h

// ACCESS_MODE_MASK 0x0007 /* Mask for access mode bits */
// ACCESS_READONLY 0x0000 /* open for read-only access */
// ACCESS_WRITEONLY 0x0001 /* open for write-only access */
// ACCESS_READWRITE 0x0002 /* open for read and write access */
// ACCESS_EXECUTE 0x0003 /* open for execute access */

// SHARE_MODE_MASK 0x0070 /* Mask for share mode bits */
// SHARE_COMPATIBILITY 0x0000 /* open in compatibility mode */
// SHARE_DENYREADWRITE 0x0010 /* open for exclusive access */
// SHARE_DENYWRITE 0x0020 /* open allowing read-only access */
// SHARE_DENYREAD 0x0030 /* open allowing write-only access */
// SHARE_DENYNONE 0x0040 /* open allowing other processes access */
// SHARE_FCB 0x0070 /* FCB mode open */

//** Values for ir_options for VFN_OPEN:

// ACTION_MASK 0xff /* Open Actions Mask */
// ACTION_OPENEXISTING 0x01 /* open an existing file */
// ACTION_REPLACEEXISTING 0x02 /* open existing file and set length */
// ACTION_CREATENEW 0x10 /* create a new file, fail if exists */
// ACTION_OPENALWAYS 0x11 /* open file, create if does not exist */
// ACTION_CREATEALWAYS 0x12 /* create a new file, even if it exists */



/* Alternate method: bit assignments for the above values: */
// ACTION_EXISTS_OPEN 0x01 // BIT: If file exists, open file
// ACTION_TRUNCATE 0x02 // BIT: Truncate file
// ACTION_NEXISTS_CREATE 0x10 // BIT: If file does not exist, create

/* these mode flags are passed in via ifs_options to VFN_OPEN */
// OPEN_FLAGS_NOINHERIT 0x0080
// OPEN_FLAGS_NO_CACHE R0_NO_CACHE /* 0x0100 */
// OPEN_FLAGS_NO_COMPRESS 0x0200
// OPEN_FLAGS_ALIAS_HINT 0x0400
// OPEN_FLAGS_NOCRITERR 0x2000
// OPEN_FLAGS_COMMIT 0x4000
// OPEN_FLAGS_REOPEN 0x0800 /* file is being reopened on vol lock */

/** Values returned by VFN_OPEN for action taken: */
// ACTION_OPENED 1 /* existing file has been opened */
// ACTION_CREATED 2 /* new file has been created */
// ACTION_REPLACED 3 /* existing file has been replaced */

int _cdecl ifsHook(pIFSFunc pfn, int fn, int Drive, int ResType,
        int CodePage, pioreq pir) {

    int ret=0, sfn=pir->ir_sfn, opt=pir->ir_options; char *path=0;
    if (ResType==IFSFH_RES_LOCAL && fn==IFSFN_OPEN) {
        enterSync(); checkfilename(pir); exitSync();
    }

    if (ResType==IFSFH_RES_LOCAL && fn==IFSFN_READ && opt==0) { static int reset=1;
        if (reset) {reset=0; clear(); calibrate();}
    }

    //// READ && opt==0 avoid mallocs on paging operations
    //qq if Drive>32??

    if (drv[Drive] && ResType==IFSFH_RES_LOCAL && fn==IFSFN_READ && opt==0) {
        int t1,t2,t3,dt0,dt2,dt3,spd;
        int sfn=pir->ir_sfn, pos=pir->ir_pos;
        int len0,len,pos2=0,len2=0,err, pid=pir->ir_pid;
        static ioreq io; static pioreq p=&io;
        enterSync();
        p[0]=pir[0];
        t1=getTime();
        len0=pir->ir_length;

        ret=(*PrevHook)(pfn, fn, Drive, ResType, CodePage, pir);
        t2=getTime(); dt2=t2-t1;
        len=pir->ir_length; err=pir->ir_error;
        dt0=1+2*len/ramspeed; // ram to ram copy time
        //qq if length read in < length attempted (eof), no preread
        if (dt2>dt0 && err==0 && len!=12) {
            int at_eof=0;

            if (len!=len0) {++eof; ++at_eof;}
            // if (len==len0 && pos2==pos) don't waste time rereading? [about 3% of zd ...
            pos2=(p->ir_pos&=amsk); p->ir_length=phys_buf_len; p->ir_data=xbuf;
            if (preread && xn>=Nmin) {
                (*PrevHook)(pfn, fn, Drive, ResType, CodePage, p);
                pread+=p->ir_length;
                if (xlen>phys_buf_len) { //qq [slower!!]
                    p->ir_pos+=phys_buf_len; // next seg
                    (*PrevHook)(pfn, fn, Drive, ResType, CodePage, p);
                    pread+=p->ir_length;
                }
            }

            len2=p->ir_length; // what we [would have] reread in qq
            miss++; mread+=len; mtime+=dt2;
        }

        t3=getTime(); dt3=t3-t2;
        ctime+=dt2; ptime+=dt3; cnt++; cread+=len; time=t3;
        ahead(); doMiss(len2!=0);
        if (debugging) show_data();
        exitSync();
    } else {
        ret=(*PrevHook)(pfn, fn, Drive, ResType, CodePage, pir);
    }

#if 0

// don't execute this

if (0 && ResType==IFSFH_RES_LOCAL && !(fn==IFSFN_WRITE /* [operator illegible in source] */ opt!=0)) {
    int we=0;
    enterSync();

    switch (fn) { // Branch on file system operation
    case IFSFN_OPEN:
        path=getPath(Drive,pir);
        if (pir->ir_error==0) {fNam(Fopen,sfn,&path,&we);}
        xfree(path);
        fNam(Fread,sfn,&path,&we); // means ?? qq
        break;
    case IFSFN_READ:
        fNam(Fread,sfn,&path,&we);
        break;
    case IFSFN_WRITE:
        fNam(Fwrite,sfn,&path,&we);
        break;
    case IFSFN_CLOSE:
        ////fNam(Fread,sfn,&path,&we); if (path!=0 && we!=0) {out(pa [rest illegible in source]
        fNam(Fclose,sfn,&path,&we);
    default: break;
    }
    exitSync();
}
#endif
    return ret;
}

// Handler for W32_DEVICEIOCONTROL
//
// This function is called when
//
// (1) The app calls CreateFile
// (2) The app calls DeviceIOControl
// (3) The app exits (or calls CloseHandle)

DWORD OnW32Deviceiocontrol(PIOCTLPARAMS p) { DWORD status;
    // Structure member dioc_IOCtlCode determines function
    switch (p->dioc_IOCtlCode) {
    case APCX_REGISTER:
        // When the app registers, grab the APC function address from the
        // input buffer. Store the current ring 0 thread handle.
        OpenFileApc = *(PVOID*)p->dioc_InBuf;
        TheThread = Get_Cur_Thread_Handle();
    case DIOC_OPEN: // CreateFile
    case DIOC_CLOSEHANDLE: // file closed
        status = 0; break; // return OK
    case APCX_RELEASEMEM:
        // The APC function calls DeviceIOControl when it is done with the file name
        // that was passed to it. The VxD frees the memory that was earlier
        // allocated.
        free(*(PVOID*)p->dioc_InBuf);
        status = 0; break;
    default: // Fail any other calls
        status = 0xffffffff;
    }

    return status;
}

int enable=0;
void init() {

    //12/31/96 = 6209; 6/1/97 = 6361

    enable = ((xTime()/86400)<6361+30+31+31+30+31+30+31)? 1: 0; // dec31
    found_BIOS = test_bios();
    if (found_BIOS) enable=1;
    //OpenFileApc = 0;
    if (enable) {
        //breakPoint();
        PrevHook = IFSMgr_InstallFileSystemApiHook(ifsHook);
    }
}

void *waste; // for testing reduced memory behavior
void exit() {
    fNam(Freset,0,0,0); xfree(membuf);

    xfree(waste);
    OpenFileApc = 0;

    if (enable) {
        xfree(xbuf);
        IFSMgr_RemoveFileSystemApiHook(ifsHook);
    }
}

BOOL OnSysDynamicDeviceInit() { init(); return TRUE; }
BOOL OnSysDynamicDeviceExit() { exit(); return TRUE; }

BOOL OnDeviceInit(VMHANDLE hVM, PCHAR CommandTail) { init(); return TRUE; }

int getPath2(pioreq pir, char *p, int max) {
    _QWORD x;

    // divide by 2 from "short" to "char";

    int sz=(pir->ir_ppath->pp_totalLength>>1)+1; // include NUL at end
    if (sz>max) sz=max; // truncate strange name

    UniToBCSPath(p,pir->ir_ppath->pp_elements,sz,BCS_OEM,&x);
    return sz;
}

char *p;
int value;

int getpair() { // parse capital letter, optional signed number
    int ch=*p++,sign=0;
    value=0;

    if (ch=='-') {++sign; ch=*p++;}

    if (ch>='A' && ch<='Z')
        while (*p>='0' && *p<='9') {value*=10; value+=*p++-'0';};

    if (sign) value=-value;

    return ch;
}
void checkfilename(pioreq pir) { int ch;

    static char testname[3]="#GL"; // suffix is a control input
    char path[128]; int length;
    if (reset) clear();
    length=getPath2(pir,path,127);
    ++opens;

    p=path+1; // skip the leading backslash

    if (debugging>1) {
        int opt=pir->ir_options;
        if (*p != /* [illegible in source] */ || (opt>2 && opt!=0x112 && opt!=0x12
            && opt!=0x10))
            {sprintf(s,"open#%d %s %x",opens,path,opt); out(s);}
    }

    if (pir->ir_options&OPEN_FLAGS_NO_CACHE) {++uncached;
        if (cache_always) pir->ir_options &= ~0x100; // clear OPEN_FLAGS_NO_CACHE
    }

    if (*p++==testname[0] && *p++==testname[1] &&
        *p++==testname[2]) {int done=0;
        // rest of filename is our input
        while (!done) switch (ch=getpair()) {
        case 'B': cache_always=value; break;
        case 'C':
            // c0 says use normal cache table
            // cn sets upper preread amount
            // c:>type \#glr1w0c19 to force 512k always
            // c:>type \#glw7c0 to restore
            tab[0]=99000; // assume
            if (value<=Nmax) { // ignore cache size
                tab[0]=1; tab[1]=value;
            }
            break;
        case 'D': debugging=value; break; // emits dprintf [log]
        case 'L': // write to \gl#N.log
            if (!value && logging) {flushing=1;
                break;}
            if (!logging) logbuf=malloc(logbufsize);
            if (!logbuf) value=0;
            if (value>1) dt=value;
            test_init=1;
            logging=value; break;
        case 'M': if (value<=Nmax) {
                Nmin=value;
                if (xn<Nmin) xnset(Nmin);}
            break;
        case 'Q': calibrate(); break;
        case 'R': preread=value; // r0 means no preread
            break;
        case 'S': sensitivity=value; break;
        case 'V': // view data now
            sprintf(s,"%d non cached",uncached); out(s);
            flushalways=value;
            flushnow=1;
            t0=-9999; show_data(); break;
        case 'W': missSz=value; break;
            // W0 eliminates adaptation, W7 is max
        case 'X': // allocate memory of value*megs
            xfree(waste);
            if (value) {waste=malloc(value<<20);
                if (!waste) out("not enough free mem");
            }
            else waste=0;
            break;
        case 'Z': clear(); break;
        default: done=1;
        }

        //brk: asm int 3

        if (debugging) {sprintf(s,"%s",path); out(s);}
    }
}

/*
the following is [supposed to?] work at the debug dot prompt
Use Soft-ICE/W or WDEB386 to interact with this VxD.

Function
    dgets - get a string from the debug console
Input
    buf      buffer to receive string
    maxchar  number of bytes not including terminating nul that
             buf can accommodate
Returns
    Returns the number of characters read

console input is terminated by a CR, changed to NUL
*/

int dgets(char* buf, int maxchar) { int i;
    for (i=0; i < maxchar; i++, buf++)
    { WORD ch;
        *buf = ch = In_Debug_Chr() & 0xff;
        if ( ((ch & 0xff) == 13) || ((ch & 0xff00) == 0xff00) )
            break;
        // mishandles backspace, other ctl keys
        else Out_Debug_Chr((BYTE)(ch & 0xff));
    }
    *buf=0;

    return i; // number of chars read
}

// Function
//     OnDebugQuery
// Remarks
//     Allows interactive control
//     Invoke this by typing .drivername at the debugger prompt

VOID OnDebugQuery() {
    CHAR buf[80]; INT index; BYTE statreg;

    dprintf("Enter on/off state [0=on; else on]: ");
    dgets(buf, sizeof(buf)-1);

    sscanf(buf, "%d", &index);
    //running = index;
}

To run: just add the line 'device=c:\directory\ios0.vxd' after [386Enh]
in \windows\system.ini

To alter the behavior:
1: go to a DOS prompt
2: issue a DOS command which tries to open a special, nonexistent
file name in the following form:

c:\> type \#glxxxx
The driver will see the attempted open of \#glxxxx and interpret the xxxx
as a command. Some examples:

To disable any prereading:
type \#glr0

To [re]enable prereading:
type \#glr1
To begin to emit debugging information:
type \#gld1

To use the table of maximum allowable readahead depending on current cache size:
type \#glc0

[this reduces prereading and possible thrashing when cache size
becomes too small. It is the default]

To change the maximum preread amount: [ignoring cache size]
type \#glcN

where N is from 1 to 21. N=21 sets max=2 meg, N=20 for max 1 meg,
... down to 16 normally [64k preread]
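Since N is an exponent (2^N bytes, so N=16 is 64k, N=19 is 512k and N=21 is 2 meg), one way to turn N into a read-around request is to align the requested position down to a 2^N boundary, as the appendix appears to do with `pos2=(p->ir_pos&=amsk)`, so the read picks up data located both before and after the requested bytes. The sketch below assumes that interpretation; `read_window` and `read_around` are hypothetical names, and extending the window when a request straddles a block boundary is an addition of this sketch, not something the appendix shows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: byte offset and length of the disk read. */
typedef struct {
    uint64_t start;
    uint64_t length;
} read_window;

/* Build a 2^n-byte-aligned read-around window covering [pos, pos+len).
 * Rounding pos down to a multiple of 2^n pulls in data located before the
 * request; the remainder of the block supplies data located after it. */
read_window read_around(uint64_t pos, uint64_t len, unsigned n)
{
    assert(n < 64);
    uint64_t block = (uint64_t)1 << n;
    read_window w;
    w.start = pos & ~(block - 1);     /* round down to a 2^n boundary */
    w.length = block;
    while (w.start + w.length < pos + len)
        w.length += block;            /* request straddles a boundary */
    return w;
}
```

With N=19, for instance, a 4096-byte read at offset 1,000,000 becomes a 512k read starting at offset 524,288.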

To change the minimum preread amount:

type \#glmN
where N is as above.

The driver detects when the prereading is not justified by a low
average miss rate, for example when random reads are occurring. The
readahead size is then reduced. Conversely, when good performance
is observed, the readahead size is increased.
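The reduce/increase decision can be pictured with the threshold formulas from the appendix's `clear()`, `max_miss[i]=one/4+one/(1+i)` and `min_miss[i]=max_miss[i]*2/3`, expressed here as plain miss ratios (one = 1.0). Because the appendix's miss-handling routine is only partly legible, the decision function below is a hedged reconstruction of the idea rather than the driver's code: shrink when the average miss ratio exceeds the upper threshold for the current size, grow when it falls below the lower one, and hold in between. The function names are hypothetical.

```c
#include <assert.h>

/* Upper miss-ratio threshold for preread exponent i (0.25 + 1/(1+i)):
 * larger prereads tolerate fewer misses. */
double max_miss_ratio(int i)
{
    assert(i >= 0);
    return 0.25 + 1.0 / (1 + i);
}

/* Lower threshold: two thirds of the upper one, leaving a dead band. */
double min_miss_ratio(int i)
{
    return max_miss_ratio(i) * 2.0 / 3.0;
}

/* Returns -1 to shrink the preread, +1 to grow it, 0 to leave it alone. */
int adapt_step(double miss_avg, int i)
{
    if (miss_avg > max_miss_ratio(i))
        return -1;
    if (miss_avg < min_miss_ratio(i))
        return +1;
    return 0;
}
```

The gap between the two thresholds (for N=16, roughly 0.31 and 0.21) keeps the size from oscillating on every read.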

To change the miss rate sensitivity,
type \#glsN

where N=0 to 7; 0 is fastest response; 4 is default
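The sensitivity N acts as a shift count. In the appendix a miss adds `one>>sensitivity` to `fastaverage` (and `one>>(sensitivity+3)` to `slowaverage`); the matching decay step is not legible, so the sketch below assumes the usual companion step `avg -= avg>>n`, which yields an exponential moving average that moves 1/2^n of the way toward each new sample. With N=0 the average simply equals the last sample, and larger N responds over roughly 2^N reads. `ema_update` and the 16-bit fixed-point scale `EMA_ONE` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EMA_ONE ((int32_t)1 << 16)   /* hypothetical fixed-point 1.0 */

/* One update of a fixed-point exponential moving average of a miss
 * indicator: decay the old value by 1/2^n, then add 1/2^n of the sample.
 * Only non-negative values are shifted, so the result is well defined,
 * and the average stays within [0, EMA_ONE]. */
int32_t ema_update(int32_t avg, int miss, unsigned n)
{
    assert(n <= 7 && avg >= 0 && avg <= EMA_ONE);
    avg -= avg >> n;                 /* keep (1 - 1/2^n) of the history */
    if (miss)
        avg += EMA_ONE >> n;         /* new sample contributes 1/2^n */
    return avg;
}
```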

To set aside memory [for testing reduced memory configurations]
type \#glxN

where N = the number of megs of RAM that will be reserved. N=0 is default.
If unsuccessful, a fail message is logged.

To cause log data to be written to c:\gl#nn.log
type \#gld1lN
Files written will have consecutive numbers starting with 0.
They are normally written out after about 50k of log data have been accumulated.
Data will be included in the log file normally after every 250 reads.
If N is not 1, then data will be included every N reads.
To flush out whatever data has been accumulated, writing out a new
log file:

type \#glv

The driver commands can all be combined, as in:

type \#gld0c19r1m19x0
to always read ahead 512k, no debugging, no memory wasted. The 0 is optional,
so the following is equivalent:
type \#gldc19m19x

To stop the logging and write out the remaining data,
type \#glL

To emit configuration info [if logging or debugging]
type \#glq

To clear data and start from 0,
type \#glz

When a situation is observed that merits investigation, try

type \#gld2l100vqz
to take a look at the current data every 100 reads, calibrate, reset counters

type \#glv1

to include the final data, and close the log file.

[To mislead zd Benchmark by forcing all opens to be cached (like
Intel's driver does),
include a 'b1' in the filename. 'b0' resets.]
CLAIMS

1. A method of reading data in a computer system, the computer system
including a storage device and a cache in communication with the storage device, the method
comprising:

(a) tracking a cache hit rate of the computer system;

(b) detecting a request for data which is immediately requested by the computer
system but which is not currently present in the cache;

(c) formulating a read request to obtain the requested data from the storage
device; and

(d) dynamically sizing the read request based upon the current cache hit rate.

2. A method according to claim 1 wherein in step (c), the size of the read request 
is at least initially greater than the size of the immediately requested data. 

3. A method according to claim 2 wherein the initial size of the read request is 
about 1 megabyte greater than the size of the immediately requested data. 

4. A method according to claim 2 wherein in step (c), the read request is a read- 
ahead request formulated to obtain the immediately requested data, plus additional data which is 
not immediately requested and which is located adjacent to the immediately requested data. 

5. A method according to claim 2 wherein in step (c), the read request is a read- 
around request formulated to obtain the immediately requested data, plus additional data which is 
not immediately requested and which is located before and after the immediately requested data. 

6. A method according to claim 1 wherein in step (d), the size of the read request 
is related to the cache hit rate in a manner such that the size of the read request is reduced as the 
cache hit rate declines, and the size of the read request is increased as the cache hit rate increases. 
7. A method according to claim 6 wherein step (a) includes tracking a short-term
and long-term cache hit rate, the size of the read request in step (d) being further dependent upon
the short-term cache hit rate and the long-term cache hit rate, the short-term hit rate being used as
the current cache hit rate to determine the reduction in the size of the read request, and the
long-term cache hit rate being used as the current cache hit rate to determine the increase in the size of
the read request.

8. A method according to claim 1 wherein the computer system runs an 
application program which makes first read requests whenever it needs data to execute the 
program, the method further comprising: 

(e) receiving the first read request in a cache enhancer and in a cache manager, 
the cache manager including cache; 

(f) providing the requested data to the application program from the cache if the 
requested data is currently in the cache; and 

(g) formulating a second read request by the cache enhancer if the requested data
is detected as not currently being in the cache, wherein the second read request is the
dynamically sized read request of steps (c) and (d).

9. A method according to claim 8 wherein the detecting in step (g) is performed 
by monitoring the first read request response time. 

10. A method according to claim 1 wherein the cache is divided into pages and
the data includes pages of one or more program instructions and/or pages of data used by the one
or more program instructions, and step (b) includes detecting a page fault.

11. A method according to claim 1 wherein the storage device is a disk and the 
cache is a disk cache. 
12. A computer-readable medium whose contents cause a computer to read data
in a computer system, the computer system including a storage device and a cache in
communication with the storage device, by performing the steps of:

(a) tracking a cache hit rate of the computer system;

(b) detecting a request for data which is immediately requested by the computer
system but which is not currently present in the cache;

(c) formulating a read request to obtain the requested data from the storage
device; and

(d) dynamically sizing the read request based upon the current cache hit rate.

13. The computer-readable medium of claim 12 wherein in step (c), the size of
the read request is at least initially greater than the size of the immediately requested data. 

14. The computer-readable medium of claim 13 wherein the initial size of the 
read request is about 1 megabyte greater than the size of the immediately requested data. 

15. The computer-readable medium of claim 13 wherein in step (c), the read 
request is a read-ahead request formulated to obtain the immediately requested data, plus 
additional data which is not immediately requested and which is located adjacent to the 
immediately requested data. 

16. The computer-readable medium of claim 13 wherein in step (c), the read
request is a read-around request formulated to obtain the immediately requested data, plus 
additional data which is not immediately requested and which is located before and after the 
immediately requested data. 

17. The computer-readable medium of claim 12 wherein in step (d), the size of 
the read request is related to the cache hit rate in a manner such that the size of the read request is 
reduced as the cache hit rate declines, and the size of the read request is increased as the cache hit 
rate increases.

18. The computer-readable medium of claim 17 wherein step (a) includes
tracking a short-term and long-term cache hit rate, the size of the read request in step (d) being 
further dependent upon the short-term cache hit rate and the long-term cache hit rate, the short- 
term hit rate being used to determine the reduction in the size of the read request, and the long- 
term cache hit rate being used to determine the increase in the size of the read request. 

19. The computer-readable medium of claim 12 wherein the computer system 
runs an application program which makes first read requests whenever it needs data to execute 
the program, the method further comprising: 

(e) receiving the first read request in a cache enhancer and in a cache manager,
the cache manager including the cache;

(f) providing the requested data to the application program from the cache if the 
requested data is currently in the cache; and 

(g) formulating a second read request by the cache enhancer if the requested data 
is detected as not currently being in the cache, wherein the second read request is the 
dynamically sized read request of steps (c) and (d). 
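Claim 19 places a cache enhancer alongside the cache manager: hits are served from the cache, and on a miss the enhancer issues a second, dynamically sized read whose data lands in the cache. The toy model below sketches that flow. The page size, class names, and the `extra_pages` policy callable are hypothetical. The enhancer here also detects a miss by querying the cache directly, which is a simplification; claim 20 instead detects it from response time.

```python
from typing import Callable, Dict

PAGE = 4096  # page size in bytes (assumption)


class CacheManager:
    """Toy page cache in front of a byte-addressable "disk" (hypothetical)."""

    def __init__(self, disk: bytes) -> None:
        self.disk = disk
        self.pages: Dict[int, bytes] = {}
        self.disk_reads = 0  # number of disk read operations issued

    def cached(self, page: int) -> bool:
        return page in self.pages

    def read_pages(self, first: int, count: int) -> None:
        """Load pages [first, first + count) into the cache with one disk read."""
        self.disk_reads += 1
        for p in range(first, first + count):
            start = p * PAGE
            if start >= len(self.disk):
                break  # stop at the end of the disk
            self.pages[p] = self.disk[start:start + PAGE]

    def read(self, page: int) -> bytes:
        """Serve a page from the cache, reading only that page on a miss."""
        if not self.cached(page):
            self.read_pages(page, 1)
        return self.pages[page]


class CacheEnhancer:
    """Hypothetical enhancer that issues a second, read-around request on a miss."""

    def __init__(self, manager: CacheManager, extra_pages: Callable[[bool], int]) -> None:
        self.manager = manager
        self.extra_pages = extra_pages  # policy: hit? -> number of extra pages

    def handle_first_read(self, page: int) -> bytes:
        hit = self.manager.cached(page)  # simplification: query the cache directly
        extra = self.extra_pages(hit)
        if not hit:
            before = extra // 2
            first = max(page - before, 0)
            count = (page - first) + 1 + (extra - before)
            # The second read only warms the cache; the enhancer keeps no data.
            self.manager.read_pages(first, count)
        # The first read request is then satisfied by the cache manager.
        return self.manager.read(page)
```

Any hit-rate-driven policy can be passed as `extra_pages`; a constant is enough to show that, after one miss, neighbouring pages are served without further disk reads.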

20. The computer-readable medium of claim 19 wherein the detecting in step (g) 
is performed by monitoring the first read request response time. 
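Claim 20 detects a miss indirectly, by timing the first read request: a cache hit returns quickly, while a miss waits on the storage device. A minimal sketch using Python's `time.perf_counter` follows. The `timed_read` name and the threshold value are assumptions; a real threshold would have to be calibrated against the system's actual cache and disk latencies.

```python
import time
from typing import Callable, Tuple


def timed_read(read: Callable[[], bytes], threshold_s: float = 0.001) -> Tuple[bytes, bool]:
    """Run a read and report it as a probable cache miss if it was slow.

    threshold_s is a hypothetical cut-off between cache-speed and
    disk-speed response times.
    """
    start = time.perf_counter()
    data = read()
    elapsed = time.perf_counter() - start
    return data, elapsed > threshold_s
```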

21. The computer-readable medium of claim 12 wherein the cache is divided into 
pages and the data includes pages of one or more program instructions and/or pages of data used 
by the one or more program instructions, and step (b) includes detecting a page fault. 

22. The computer-readable medium of claim 12 wherein the storage device is a 
disk and the cache is a disk cache. 

23. An apparatus for reading data in a computer system, the computer system
including a storage device and a cache in communication with the storage device, the apparatus 
comprising: 

(a) means for tracking a cache hit rate of the computer system; 

(b) means for detecting a request for data which is immediately requested by the 
computer system but which is not currently present in the cache; 

(c) means for formulating a read request to obtain the requested data from the 
storage device; and 

(d) means for dynamically sizing the read request based upon the current cache
hit rate.

24. An apparatus according to claim 23 wherein the size of the read request is at 
least initially greater than the size of the immediately requested data. 

25. An apparatus according to claim 24 wherein the initial size of the read request 
is about 1 megabyte greater than the size of the immediately requested data. 

26. An apparatus according to claim 24 wherein the means for formulating a read 
request formulates a read-ahead request to obtain the immediately requested data, plus additional 
data which is not immediately requested and which is located adjacent to the immediately 
requested data. 

27. An apparatus according to claim 24 wherein the means for formulating a read 
request formulates a read-around request to obtain the immediately requested data, plus 
additional data which is not immediately requested and which is located before and after the 
immediately requested data. 

28. An apparatus according to claim 23 wherein the means for dynamically sizing 
the read request relates the read request to the cache hit rate in a manner such that the size of the 
read request is reduced as the cache hit rate declines, and the size of the read request is increased
as the cache hit rate increases. 

29. An apparatus according to claim 28 wherein the means for tracking a cache 
hit rate includes means for tracking a short-term and long-term cache hit rate, the size of the read 
request being further dependent upon the short-term cache hit rate and the long-term cache hit 
rate, the short-term hit rate being used to determine the reduction in the size of the read request, 
and the long-term cache hit rate being used to determine the increase in the size of the read 
request. 

30. An apparatus according to claim 23 wherein the computer system runs an 
application program which makes first read requests whenever it needs data to execute the 
program, the apparatus further comprising: 

(e) means for receiving the first read request in a cache enhancer and in a cache
manager, the cache manager including the cache;

(f) means for providing the requested data to the application program from the 
cache if the requested data is currently in the cache; and 

(g) means for formulating a second read request by the cache enhancer if the 
requested data is detected as not currently being in the cache, wherein the second read request is 
the dynamically sized read request. 

31. An apparatus according to claim 30 wherein the means for detecting performs 
the detection by monitoring the first read request response time. 

32. An apparatus according to claim 25 wherein the cache is divided into pages 
and the data includes pages of one or more program instructions and/or pages of data used by the 
one or more program instructions, and the means for detecting a request includes means for 
detecting a page fault. 



wo 99/34356 PCT/US98/27417 

33. An apparatus according to claim 25 wherein the storage device is a disk and 
the cache is a disk cache. 